Introduction to STL for Macintosh Programmers

More Modern C++

Good afternoon. I would like to start by introducing myself. I am Jon Kalb, and I work for Liberty Software, a Macintosh-only software consulting firm in the Bay Area.

Welcome to "More Modern C++." Although I have written shipping code that makes heavy use of the Standard Template Library, I don’t hold myself out as an expert. My intention is simply to build on what we discussed in my earlier session. Could I see a show of hands from anyone who was in that earlier session? …Thanks.

If you attended either of my sessions last year, you know that I spent a significant amount of time in them discussing where Metrowerks had not yet implemented the standard completely and how to work around these limitations. Fortunately for all of us, I don’t need to do that this year. Metrowerks has essentially completely implemented the standard. If you have problems now, they are yours.

My first topic is the template class auto_ptr<>, and the question I want to ask is, "If you aren’t using auto_ptr<>, then how are you writing exception-safe code?"

I can remember the first time I was introduced to stack-based objects. This was before PowerPlant made them deservedly popular with its "St" classes.

My initial reaction was "Wow!" This is very useful for a whole set of situations. We can restore state after locking handles, opening files, setting grafport attributes, and on and on.

My joy was just the convenience and safety of not having to remember to restore state before returning. At that time I wasn’t thinking about exceptions. Now that I am using exceptions in my code, my mantra for auto_ptr<> is, "Learn it; love it; live it."

The auto_ptr<> template class is just a wrapper whose destructor calls delete on the pointer it owns. (It also overloads the pointer operators * and -> so that it transparently supports pointer syntax. It really is just set and forget.) Let’s look at an example of why auto_ptr<> makes life good.

<noautoptr.html>

Here we have a class that contains pointers to objects that it owns. Imagine that our attempt to create the object pointed to by "b" throws an exception. What happens to "a"? It leaks! If a constructor exits by throwing, the object was never fully constructed, so its destructor is never called, and the delete of "a" in that destructor never runs.

This type of situation comes up often, not just in constructors, but anywhere that we are working with pointers that must be deleted and an exception might be thrown.

<noautoptralt.html>

I have also provided an alternative implementation. This solves the problem, but look how much more work it is. Now I’m not opposed to work, but more work means more chances to forget something. I deliberately "forgot" something in this example. Have you spotted it?

That’s right. I need to re-throw the exception in the catch block.

Consider for a moment if "a" and "b" were const pointers. Not pointers to const objects, but const pointers. That would mean that they can only be initialized in a constructor’s initializer list. Now our alternative fix isn’t just wordy and error prone; it’s also useless. This problem isn’t unsolvable without auto_ptr<>, of course, but the code will get messier. If we use auto_ptr<>, the code gets cleaner.

<autoptr.html>

Here is our solution using auto_ptr<>. Hey, that is exactly what we wrote the first time, except that we have declared "a" and "b" to be const auto_ptr<>s, and our destructor got simpler.

The syntax for using auto_ptr<>s is designed to be transparent. Use them just as if they were pointers.

The code works just like we want it to. It’s exception safe because these pointers will always take care of themselves when they go out of scope. In the LFile example, sourceFile will not leak and the file will be closed whether we hit an exception or not.

One thing to be aware of with auto_ptr<> is that it has transfer-of-ownership semantics. What does that mean? It means that any time the value of an auto_ptr<> is passed to another auto_ptr<> (by copy construction, by assignment, by passing it as an argument, or by returning it from a function), the original auto_ptr<> is modified: it no longer owns the pointer and isn’t responsible for deleting it, and the new auto_ptr<> accepts that responsibility. Because copying an auto_ptr<> changes its source, auto_ptr<>s cannot be used in any of the standard containers.

Although auto_ptr<> isn’t the be-all and end-all of smart pointers, it is a valuable addition to your toolbox. With almost no overhead at all, we converted potentially leaking pointers into objects that clean up after themselves.

Imagine that we are working with objects that are expensive to copy, either because they are large or just because they have non-trivial constructors or destructors. In this case it makes sense to put pointers to these objects in a standard container rather than copies of the objects. Since we are using Modern C++, we want to use smart pointers like auto_ptr<>, but since auto_ptr<>, the only smart pointer in the standard, cannot be used in the standard containers, we have to look elsewhere. Elsewhere, in this case, is www.boost.org. Boost has a collection of smart pointers. Here is an example of using one of them.

<boost.cp>

In this example we’ll just pretend that ints are expensive to copy. You wouldn’t ever really store smart pointers to int in this way.

What I have done here is define the "less than" operator for the boost shared_ptr<> to be the "less than" operator of the dereferenced pointers, so that comparing two shared_ptr<>s compares the objects they point to.

Now instead of a container of expensive-to-copy objects, we have a container of pointers to these objects. Now we can sort the objects without moving (and therefore copying) them.

So the question is: where are the allocated objects freed? On line two. When the list is destroyed, each object loses one reference, but each is still referenced by a stack-based smart pointer, so the objects stick around until those smart pointers are destroyed.

I also wanted to touch on one point that Darin made in his session on memory strategies. This example is not legal C++. Can you see why?

<illegal.html>

Of course. Since we don’t know the internal representation of std::vector<>, we can’t assume that it is contiguous memory. But it turns out that we can: contiguous storage for vector<> is going to be written into the standard, so this code will become legal.

I invite you to read comp.lang.c++.moderated. It is valuable for both formal questions and informal dos and don’ts.

Now we come to my favorite container class: the standard string class. Every framework has a string class, so why do we need yet another one?

Perhaps we don’t, but this one is well designed, potentially performant, and, best of all for contractors like myself, it is, or soon will be, universal.

I am not going to show you the entire set of public functions; I’m not even going to show you all of the different constructors. I just want to show you the high points.

<STL9.cp>

The string type’s template nature is hidden by the "string" typedef. The result is that strings really come very close to looking and behaving as if they were fundamental language types. They support string/array syntax and STL container syntax. There is no char * conversion operator, but the c_str() member function returns a const char * so we can continue to use all our old routines that depend on C strings.

I should mention that c_str() only promises that the returned C string is valid until the next non-const operation on that string.

As I said, strings are becoming my favorite container, and it's just because they are so darn convenient and easy to use. You don't have to worry about overrunning buffers or leaking memory. You really do treat them just like ints. You can assign them, pass them, and return them without worrying.

Now I know what you are thinking: "There is no such thing as a free string. What is the catch?" You are right to be skeptical. Alex Stepanov, the creative force behind STL, points out that library users really do need to know more than just a library's API. Without some understanding of its performance characteristics, a library can't be used well. Consider the MacOS Resource Manager. The API makes it look like a database, but it wasn't intended for use as a database, and it won't perform well if that is how you use it. The API alone doesn't tell you all you need to know.

The standard, of course, doesn't tell us anything about the implementation of the standard strings or any of the standard classes, because it doesn't want to mandate any one approach. The drawback for us is that there is little or no documentation of the implementation. There are now quite a few books that deal with STL, but almost all of them avoid documenting the implementation for the same reason: there is no one implementation, and anything written about implementation may be wrong for different platforms, different compilers, and even different releases of the same compiler.

With this as a caveat, I have done a little digging on the Metrowerks string class implementation.

The first important thing to notice is that the data is always contained in a single contiguous buffer that is null terminated. The implication of this is that the c_str() function is trivial; just return the address of this buffer. Lesson one: string conversion to C strings is cheap.

The next thing we find is that the buffer is grown in 32-byte increments. The total space overhead other than this buffer is about twenty-four bytes, so your space overhead per string is in the range of twenty-five to fifty-five bytes. For someone who is used to using Str255s for strings, this is actually quite an improvement in space efficiency.

Since growing the string buffer involves a new allocation, a copy, and a delete of the old buffer, it is better to anticipate your string size requirement and call the class’s reserve() member function if that is possible.

Lesson two: strings are relatively space efficient and it is better to grow your strings in one fell swoop than incrementally.

The third thing we learn about the implementation is that the library copies the string buffer only if it has to. In other words, if you initialize a string and then create another string from it, either by copy construction or assignment, then both strings share the same buffer, and the buffer is reference counted. This implementation strategy is called copy-on-write.

This can be both a space and time win if you are likely to have more than one string with the same value. Is that likely in the real world? Perhaps. Imagine that you are maintaining a list of files with strings containing the files’ names for each of several different platforms. You might have many cases where the strings would be exactly the same. Lesson three: if you are working with strings that might be identical, create them from each other to maximize buffer sharing.

The downside to the copy-on-write strategy is that, in a multithreaded environment, we might see a significant increase in locking, because the shared reference count must be protected and it is harder to be certain when writes might happen.

The last thing that we learn from looking at the source is that once you do anything that might allow you to alter a string, the string immediately makes its own copy of the string buffer and puts itself into a state (I call it the don't-share state) that will never again allow its buffer to be shared. Why? Well, if, for example, you call begin() on a string, you have an iterator to the first char of the string, and you might change it (unless it is a const string, in which case this doesn't apply).

If you change the first char of this string then it wouldn't be equal to any other string that is sharing the buffer so it has to make its own copy. Also, once the new copy is made and you have an iterator to the first char, then you could change that first char at any time in the future, so it won't share its buffer with any other string to which it is assigned or from which it is constructed.

Lesson four: if you are working with strings whose values are acquired from other strings, don't do anything that might modify a string, unless that is your intention. This is much more subtle than it sounds.

<STL10.cp>

In this example strings "a" and "b" start out sharing a buffer and end up with each one having its own buffer. The question is: On which line does the buffer copy happen?

The answer is on line one. Why? The subscript operator of a non-const string returns a reference to a char in the string. Although this code treats this value as const, the string's subscript operator function has no way of knowing how you will use the reference, so it must assume that you will modify the char value, and it makes its own copy of the buffer.

How can we avoid copying the buffer when all we want is read access? Well, we could try casting the string to a const string, but that won't work. Casting the string to a const string actually constructs a temporary const string from our string. This may not be as bad as it sounds, because the buffer itself will be reference counted and not copied (unless the original string is in the don't-share state), but it's not a very good solution, because we don't really have a good way of knowing whether the original string is in the don't-share state. A better solution is found on line three. Since the c_str() member function returns a const char *, you can't modify the string through it, so the buffer is never copied. Remember, c_str() is cheap.

If I were planning to implement an application that was a heavy string processor, I would want to look very hard at any candidate string class, but for general-purpose application string handling (whatever that is), the standard string class seems to have adequate space and time performance and great safety and portability potential. Learn it; love it; live it.

OK, you have one more objection, right? No string class is worth anything to a Macintosh programmer if it doesn't support conversion to and from Pascal strings.

Clearly, the standard string class doesn't support Pascal strings, nor is it likely to in the future. But adding such support ourselves is not a terrible burden.

<STL8.cp>

In this example I’ve created a light-weight template class that can be created on the stack to convert from a standard string to a Pascal string. I used a template so that you can decide which size Pascal string you want to put on the stack. If you did something like this, you would, of course, put it into a namespace, right? With the exception of main(), everything should be in a namespace.

This example also shows how easy it is to create standard strings from Pascal strings. I will concede that this is a bit wordy and could easily lead to a typo. I will leave a simple C-style macro or a fancy template as an exercise for you to enjoy.

Learn it; love it; live it.

If you have any questions, now is the time.